AITopics | steinwart and christmann

Out-of-Distribution generalization of quantile regression with heavy tailed inputs: an SVM approach

Leroux, Baptiste, Dombry, Clément, Sabourin, Anne

arXiv.org Machine Learning, Jun-2-2026

We study quantile regression in an extrapolation regime where the covariate takes unusually large values. Under regular variation assumptions, extreme observations can be effectively characterized through their angular components, enabling learning strategies that focus on the angle of the most extreme observations. This approach is formalized through the minimization of an asymptotic conditional risk that localizes learning in the tail of the covariate distribution. We propose a novel Support Vector Machine (SVM) framework for extreme quantile regression, leveraging reproducing kernel Hilbert spaces to handle high-dimensional and nonlinear settings. Our method also accommodates unbounded response variables and avoids restrictive transformations. We establish finite-sample learning guarantees under mild regularity assumptions. The proposed framework unifies ideas from statistical learning and multivariate extremes, providing a tractable and theoretically grounded approach to extrapolation. We complement our theoretical findings with an empirical study on river flow data from the Danube, demonstrating the practical relevance of our methods.
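As a rough illustration of two ingredients named in this abstract, the sketch below pairs the standard pinball (quantile) loss with the extraction of angular components from the covariates of largest norm. It is not the authors' SVM estimator: the function names `pinball_loss` and `extreme_angles`, the Euclidean norm, and the top-k selection rule are all assumptions made for illustration.

```python
import math


def pinball_loss(y, pred, tau):
    """Standard pinball loss at quantile level tau for one observation."""
    u = y - pred
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return tau * u if u >= 0 else (tau - 1.0) * u


def extreme_angles(X, k):
    """Return the angles x / ||x|| of the k covariates with largest norm.

    Hypothetical helper: the top-k rule and Euclidean norm are
    illustrative choices, not taken from the paper.
    """
    with_norms = [(math.hypot(*x), x) for x in X]
    # Drop zero vectors, whose angle is undefined.
    with_norms = [(r, x) for r, x in with_norms if r > 0]
    with_norms.sort(key=lambda pair: pair[0], reverse=True)
    return [tuple(c / r for c in x) for r, x in with_norms[:k]]
```

A learner localized in the tail would then be fitted on these angles only, with the pinball loss as the training criterion; how the paper does this precisely is not stated in the abstract.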

artificial intelligence, assumption, machine learning, (15 more...)

arXiv.org Machine Learning

2606.00265

Country: Europe (0.28)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

On the Robustness of Kernel Ridge Regression Using the Cauchy Loss Function

Wen, Hongwei, Betken, Annika, Koolen, Wouter

arXiv.org Machine Learning, Mar-25-2025

Robust regression aims to develop methods for estimating an unknown regression function in the presence of outliers, heavy-tailed distributions, or contaminated data, which can severely impact performance. Most existing theoretical results in robust regression assume that the noise has a finite absolute mean, an assumption violated by certain distributions, such as Cauchy and some Pareto noise. In this paper, we introduce a generalized Cauchy noise framework that accommodates all noise distributions with finite moments of any order, even when the absolute mean is infinite. Within this framework, we study the kernel Cauchy ridge regressor (KCRR), which minimizes a regularized empirical Cauchy risk to achieve robustness. To derive the $L_2$-risk bound for KCRR, we establish a connection between the excess Cauchy risk and $L_2$-risk for sufficiently large scale parameters of the Cauchy loss, which reveals that these two risks are equivalent. Furthermore, under the assumption that the regression function satisfies Hölder smoothness, we derive excess Cauchy risk bounds for KCRR, showing improved performance as the scale parameter decreases. By considering the twofold effect of the scale parameter on the excess Cauchy risk and its equivalence with the $L_2$-risk, we establish the almost minimax-optimal convergence rate for KCRR in terms of $L_2$-risk, highlighting the robustness of the Cauchy loss in handling various types of noise. Finally, we validate the effectiveness of KCRR through experiments on both synthetic and real-world datasets under diverse noise corruption scenarios.
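To make the role of the scale parameter concrete, here is a minimal sketch of one common form of the Cauchy loss, $\sigma^2 \log(1 + r^2/\sigma^2)$ for a residual $r$. The exact normalization used in the paper is not given in the abstract, so this form and the names `cauchy_loss` and `squared_loss` are assumptions.

```python
import math


def cauchy_loss(residual, sigma):
    """One common form of the Cauchy loss with scale parameter sigma.

    Assumed normalization: sigma^2 * log(1 + (r / sigma)^2).
    """
    return sigma ** 2 * math.log1p((residual / sigma) ** 2)


def squared_loss(residual):
    """Ordinary squared loss, for comparison."""
    return residual ** 2


if __name__ == "__main__":
    for sigma in (1.0, 10.0, 1000.0):
        # A residual of 100 stands in for a gross outlier.
        print(sigma, cauchy_loss(1.0, sigma), cauchy_loss(100.0, sigma))
```

Under this form, a large `sigma` makes the loss behave like the squared loss on moderate residuals, while a small `sigma` lets outliers contribute only logarithmically. This is consistent with the two-sided effect of the scale parameter that the abstract describes.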

artificial intelligence, cauchy loss, machine learning, (17 more...)

arXiv.org Machine Learning

2503.2012

Country:

North America > United States (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Density-Calibrated Conformal Quantile Regression

Lu, Yuan

arXiv.org Machine Learning, Dec-2-2024

This paper introduces the Density-Calibrated Conformal Quantile Regression (CQR-d) method, a novel approach for constructing prediction intervals that adapts to varying uncertainty across the feature space. Building upon conformal quantile regression, CQR-d incorporates local information through a weighted combination of local and global conformity scores, where the weights are determined by local data density. We prove that CQR-d provides valid marginal coverage at level $1 - \alpha - \epsilon$, where $\epsilon$ represents a small tolerance from numerical optimization. Through extensive simulation studies and an application to a heteroscedastic dataset available in R, we demonstrate that CQR-d maintains the desired coverage while producing substantially narrower prediction intervals compared to standard conformal quantile regression (CQR). The method's effectiveness is particularly pronounced in settings with clear local uncertainty patterns, making it a valuable tool for prediction tasks in heterogeneous data environments.
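For orientation, the sketch below shows the calibration step of standard split conformal quantile regression (CQR), the baseline this abstract builds on. It does not implement the density-based weighting of CQR-d, whose formula is not given here. The function names, and the assumption that lower and upper quantile predictions are already available, are illustrative.

```python
import math


def cqr_scores(lower, upper, y):
    """CQR conformity scores: how far y falls outside [lower, upper]."""
    return [max(lo - yi, yi - hi) for lo, hi, yi in zip(lower, upper, y)]


def conformal_quantile(scores, alpha):
    """Finite-sample corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this level: interval is unbounded.
        return math.inf
    return sorted(scores)[k - 1]


def cqr_interval(lower, upper, q):
    """Widen (or shrink, if q < 0) a predicted quantile band by q."""
    return lower - q, upper + q
```

With calibration scores in hand, `conformal_quantile` gives the single global correction that standard CQR applies to every test point; per the abstract, CQR-d replaces this with a density-weighted mix of local and global scores.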

cqr-d, density-calibrated conformal quantile regression, prediction interval, (8 more...)

arXiv.org Machine Learning

2411.19523

Country: North America > United States > Michigan (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Lp- and Risk Consistency of Localized SVMs

Köhler, Hannes

arXiv.org Artificial Intelligence, May-16-2023

Kernel-based regularized risk minimizers, also called support vector machines (SVMs), are known to possess many desirable properties but suffer from super-linear computational requirements when dealing with large data sets. This problem can be tackled by using localized SVMs instead, which also offer the additional advantage of being able to apply different hyperparameters to different regions of the input space. In this paper, localized SVMs are analyzed with regard to their consistency. It is proven that they inherit $L_p$- as well as risk consistency from global SVMs under very weak conditions, even if the regions underlying the localized SVMs are allowed to change as the size of the training data set increases.

artificial intelligence, localized svm, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2305.09385

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Franconia > Bayreuth (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Education (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

On the Connection between $L_p$ and Risk Consistency and its Implications on Regularized Kernel Methods

Köhler, Hannes

arXiv.org Artificial Intelligence, Mar-27-2023

As a predictor's quality is often assessed by means of its risk, it is natural to regard risk consistency as a desirable property of learning methods, and many such methods have indeed been shown to be risk consistent. The first aim of this paper is to establish the close connection between risk consistency and $L_p$-consistency for a considerably wider class of loss functions than has been done before. The attempt to transfer this connection to shifted loss functions surprisingly reveals that this shift does not reduce the assumptions needed on the underlying probability measure to the same extent as it does for many other results. The results are applied to regularized kernel methods such as support vector machines.

artificial intelligence, loss function, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2303.1521

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

Kernel-based L_2-Boosting with Structure Constraints

Wang, Yao, Guo, Xin, Lin, Shao-Bo

arXiv.org Machine Learning, Sep-16-2020

Developing efficient kernel methods for regression has been very popular over the past decade. In this paper, utilizing boosting on kernel-based weak learners, we propose a novel kernel-based learning algorithm called kernel-based re-scaled boosting with truncation, dubbed KReBooT. The proposed KReBooT helps control the structure of estimators, produces sparse estimates, and is nearly resistant to overfitting. We conduct both theoretical analysis and numerical simulations to illustrate the power of KReBooT. Theoretically, we prove that KReBooT can achieve the almost optimal numerical convergence rate for nonlinear approximation. Furthermore, using the recently developed integral operator approach and a variant of Talagrand's concentration inequality, we provide fast learning rates for KReBooT, which set a new record for boosting-type algorithms. Numerically, we carry out a series of simulations to show the promising performance of KReBooT in terms of its good generalization, near resistance to overfitting, and structure constraints.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2009.07558

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Large-scale Kernel Methods and Applications to Lifelong Robot Learning

Camoriano, Raffaello

arXiv.org Machine Learning, Dec-11-2019

As the size and richness of available datasets grow larger, the opportunities for solving increasingly challenging problems with algorithms learning directly from data grow at the same pace. Consequently, the capability of learning algorithms to work with large amounts of data has become a crucial scientific and technological challenge for their practical applicability. Hence, it is no surprise that large-scale learning is currently drawing plenty of research effort in the machine learning research community. In this thesis, we focus on kernel methods, a theoretically sound and effective class of learning algorithms yielding nonparametric estimators. Kernel methods, in their classical formulations, are accurate and efficient on datasets of limited size, but do not scale up in a cost-effective manner. Recent research has shown that approximate learning algorithms, for instance random subsampling methods like Nyström and random features, with time-memory-accuracy trade-off mechanisms are more scalable alternatives. In this thesis, we provide analyses of the generalization properties and computational requirements of several types of such approximation schemes. In particular, we expose the tight relationship between statistics and computations, with the goal of tailoring the accuracy of the learning process to the available computational resources. Our results are supported by experimental evidence on large-scale datasets and numerical simulations. We also study how large-scale learning can be applied to enable accurate, efficient, and reactive lifelong learning for robotics. In particular, we propose algorithms allowing robots to learn continuously from experience and adapt to changes in their operational environment. The proposed methods are validated on the iCub humanoid robot in addition to other benchmarks.

classification, regularization parameter, steinwart and christmann, (16 more...)

arXiv.org Machine Learning

1912.05629

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting (0.87)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)
(4 more...)

Histogram Transform Ensembles for Large-scale Regression

Hang, Hanyuan, Lin, Zhouchen, Liu, Xiaoyu, Wen, Hongwei

arXiv.org Machine Learning, Dec-8-2019

We propose a novel algorithm for large-scale regression problems named histogram transform ensembles (HTE), composed of random rotations, stretchings, and translations. First of all, we investigate the theoretical properties of HTE when the regression function lies in the Hölder space $C^{k,\alpha}$, $k \in \mathbb{N}_0$, $\alpha \in (0,1]$. In the case that $k=0, 1$, we adopt the constant regressors and develop the naïve histogram transforms (NHT). Within the space $C^{0,\alpha}$, although almost optimal convergence rates can be derived for both single and ensemble NHT, we fail to show the benefits of ensembles over single estimators theoretically. In contrast, in the subspace $C^{1,\alpha}$, we prove that if $d \geq 2(1+\alpha)/\alpha$, the lower bound of the convergence rates for single NHT turns out to be worse than the upper bound of the convergence rates for ensemble NHT. In the other case when $k \geq 2$, the NHT may no longer be appropriate in predicting smoother regression functions. Instead, we apply kernel histogram transforms (KHT) equipped with smoother regressors such as support vector machines (SVMs), and it turns out that both single and ensemble KHT enjoy almost optimal convergence rates. Then we validate the above theoretical results by numerical experiments. On the one hand, simulations are conducted to elucidate that ensemble NHT outperform single NHT. On the other hand, the effects of bin sizes on accuracy of both NHT and KHT also accord with theoretical analysis. Last but not least, in the real-data experiments, comparisons between the ensemble KHT, equipped with adaptive histogram transforms, and other state-of-the-art large-scale regression estimators verify the effectiveness and accuracy of our algorithm.

convergence rate, ensemble, steinwart and christmann, (13 more...)

arXiv.org Machine Learning

1912.04738

Country:

North America > United States > New York (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

Best-scored Random Forest Classification

Hang, Hanyuan, Liu, Xiaoyu, Steinwart, Ingo

arXiv.org Machine Learning, May-27-2019

We propose an algorithm named best-scored random forest for binary classification problems. The term "best-scored" means that each tree in the forest is selected as the one with the best empirical performance among a certain number of purely random tree candidates. In this way, the resulting forest can be more accurate than the original purely random forest. From the theoretical perspective, within the framework of regularized empirical risk minimization penalized on the number of splits, we establish almost optimal convergence rates for the proposed best-scored random trees under certain conditions which can be extended to the best-scored random forest. In addition, we present a counterexample to illustrate that in order to ensure the consistency of the forest, every dimension must have the chance to be split. In the numerical experiments, for the sake of efficiency, we employ an adaptive random splitting criterion. Comparative experiments with other state-of-the-art classification methods demonstrate the accuracy of our best-scored random forest.
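The selection idea can be illustrated in a heavily simplified setting: one-dimensional inputs and single-split "trees" (stumps) with uniformly random split points. This is only a sketch of choosing the best-scoring candidate among purely random ones, not the authors' algorithm. The names `stump_error` and `best_scored_stump`, the use of training error as the score, and the omission of the split-count penalty are all assumptions.

```python
import random


def stump_error(split, X, y):
    """Training error of a stump predicting the majority label on each side."""
    mistakes = 0
    for left_side in (True, False):
        labels = [yi for xi, yi in zip(X, y) if (xi <= split) == left_side]
        if labels:
            ones = sum(labels)
            # The majority vote misclassifies the minority class on this side.
            mistakes += min(ones, len(labels) - ones)
    return mistakes / len(y)


def best_scored_stump(X, y, n_candidates, rng):
    """Draw purely random split points and keep the best-scoring one."""
    lo, hi = min(X), max(X)
    candidates = [rng.uniform(lo, hi) for _ in range(n_candidates)]
    return min(candidates, key=lambda s: stump_error(s, X, y))


if __name__ == "__main__":
    X = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
    y = [0, 0, 0, 1, 1, 1]
    split = best_scored_stump(X, y, n_candidates=50, rng=random.Random(0))
    print(split, stump_error(split, X, y))
```

A forest in this toy setting would repeat `best_scored_stump` several times with independent draws and aggregate the resulting predictions by voting.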

artificial intelligence, machine learning, random forest, (16 more...)

arXiv.org Machine Learning

1905.11028

Country:

North America > United States > New York (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > California (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Kernel Machines With Missing Responses

Liu, Tiantian, Goldberg, Yair

arXiv.org Machine Learning, Jun-7-2018

Missing responses is a missing data format in which outcomes are not always observed. In this work we develop kernel machines that can handle missing responses. First, we propose a kernel machine family that uses mainly the complete cases. For the quadratic loss, we then propose a family of doubly-robust kernel machines. The proposed kernel-machine estimators can be applied to both regression and classification problems. We prove oracle inequalities for the finite-sample differences between the kernel machine risk and Bayes risk. We use these oracle inequalities to prove consistency and to calculate convergence rates. We demonstrate the performance of the two proposed kernel machine families using both a simulation study and a real-world data analysis.

artificial intelligence, kernel machine, machine learning, (16 more...)

arXiv.org Machine Learning

1806.02865

Country: North America > United States > California > Los Angeles County (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
